High Performance Cache Architectures to Support Dynamic Superscalar Microprocessors
Abstract
Simple cache structures are not sufficient to provide the memory bandwidth needed by a dynamic superscalar computer, so more sophisticated memory hierarchies such as non-blocking and pipelined caches are required. To provide direction for the designers of modern high performance microprocessors, we investigate the performance tradeoffs among combinations of cache size, blocking versus non-blocking caches, and cache pipeline depth within the memory subsystem of a dynamic superscalar processor for integer applications. The results show that the dynamic superscalar processor can hide about two-thirds of the additional latency of two- and three-stage pipelined caches, and that a non-blocking cache is always beneficial. A pipelined cache will only outperform a non-pipelined cache if the miss penalty and miss rate are both large.
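The abstract's closing claim can be illustrated with a back-of-the-envelope model. The sketch below is not the authors' simulation methodology: it uses the textbook average-access-time formula, takes the "two-thirds hidden" figure from the abstract, and assumes (our assumption, not stated above) that the pipelined cache is worthwhile because it achieves a lower miss rate, for example by being larger at the same clock rate. All function names and numeric inputs are hypothetical.

```python
def exposed_hit_latency(pipeline_depth, hidden_fraction=2 / 3):
    """Hit latency in cycles that the processor cannot hide.

    A depth-1 (non-pipelined) cache costs 1 cycle. Each extra pipeline
    stage adds 1 cycle, of which `hidden_fraction` is assumed to be
    hidden by dynamic scheduling (about two-thirds, per the abstract).
    """
    extra_stages = pipeline_depth - 1
    return 1 + extra_stages * (1 - hidden_fraction)


def avg_access_cycles(pipeline_depth, miss_rate, miss_penalty):
    """Textbook average access time: exposed hit time + miss_rate * penalty."""
    return exposed_hit_latency(pipeline_depth) + miss_rate * miss_penalty


def pipelined_wins(depth, base_miss_rate, piped_miss_rate, miss_penalty):
    """True if the pipelined cache has the lower average access time.

    ASSUMPTION: the pipelined cache has a lower miss rate than the
    non-pipelined baseline (piped_miss_rate < base_miss_rate).
    """
    base = avg_access_cycles(1, base_miss_rate, miss_penalty)
    piped = avg_access_cycles(depth, piped_miss_rate, miss_penalty)
    return piped < base


# Hypothetical inputs: small miss rate and penalty vs. large ones.
low = pipelined_wins(2, base_miss_rate=0.02, piped_miss_rate=0.01, miss_penalty=10)
high = pipelined_wins(2, base_miss_rate=0.10, piped_miss_rate=0.05, miss_penalty=50)
```

Under these made-up inputs the non-pipelined cache wins in the first case and the pipelined cache wins in the second, because the miss cycles saved must exceed the exposed extra hit latency (about one-third of a cycle per added stage in this model).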
Similar resources
Increasing Cache Port Efficiency for Dynamic Superscalar Microprocessors Kenneth
The memory bandwidth demands of modern microprocessors require the use of a multi-ported cache to achieve peak performance. However, multi-ported caches are costly to implement. In this paper we propose techniques for improving the bandwidth of a single cache port by using additional buffering in the processor, and by taking maximum advantage of a wider cache port. We evaluate these techniques ...
Multiple Branch Prediction for Wide-Issue Superscalar
Modern microarchitectures employ superscalar techniques to enhance system performance. Because superscalar microprocessors must fetch at least one instruction cache line at a time to support a high issue rate and a large amount of speculative execution, multiple branches are often encountered in one cycle. In a practical implementation this would cause serious problems while ...
Process Prefetching for a Simultaneous Multithreaded Architecture
Traditional superscalar architectures shall eventually prove incapable of taking full advantage of billions of transistors to be available in the future generations of microprocessors if they remain limited by dataflow dependencies. Thus, SMT (Simultaneous Multithreaded) architecture may be a possible solution to this problem, as far as it can fetch and execute a great deal of instruction flows...
Evaluation of dynamic branch predictors for modern ILP processors
Modern instruction-level parallel (ILP) processors use superscalar architectures with deep pipelines in order to execute multiple instructions per cycle. The frequency and behavior of branch instructions seriously hinder performance of ILP processors. Various mechanisms, both at the compiler, as well as the processor level, have been proposed to predict the branch behavior. This work investigat...
A complexity-effective microprocessor design with decoupled dispatch queues and prefetching
Continuing demands for high degrees of Instruction Level Parallelism (ILP) require large dispatch queues (or centralized reservation stations) in modern superscalar microprocessors. However, such large dispatch queues are inevitably accompanied by high circuit complexity which would correspondingly limit the pipeline clock rates. In other words, increasing the size of the dispatch queue ultimat...
Publication date: 1998